# Total attendance per year and facility type, rescaled to units of 100,000
theme_park |>
  group_by(Year, Type) |>
  mutate(
    Attendance = Attendance / 100000
  ) |>
  summarise(sum = sum(Attendance)) |>
  arrange(Type) |>
  pivot_wider(
    names_from = Type,
    values_from = sum
  ) |>
  knitr::kable(digits = 3, caption = "Summary of Attendance (in Units of 100,000) for Three Types of Facilities From 2019 to 2022")
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
| Year | Amusement/Theme Park | Museum | Water Park |
|---|---|---|---|
| 2019 | 37996.4 | 20100.8 | 5898.9 |
| 2020 | 13031.1 | 4664.5 | 2313.5 |
| 2021 | 22463.7 | 6459.0 | 3473.5 |
| 2022 | 21280.8 | 11603.3 | 4678.3 |
theme_park |>
  group_by(Year) |>
  plot_ly(y = ~Attendance, color = ~Year, type = "box", colors = "viridis")
# Mean attendance per region and year, drawn as points
theme_park |>
  group_by(Region, Year) |>
  summarize(attend_mean = mean(Attendance)) |>
  plot_ly(x = ~Year, y = ~attend_mean, color = ~Region,
          type = "scatter", mode = "markers", colors = "viridis")
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
\[H_0: \mu_{\text{Amusement/Theme Park}} = \mu_{\text{Water Park}} = \mu_{\text{Museum}} ~~ \text{vs} ~~ H_1: \text{at least two means are not equal}\]
anova_1 = aov(Attendance ~ Type, data = theme_park)
summary(anova_1)
##              Df    Sum Sq   Mean Sq F value Pr(>F)    
## Type          2 1.127e+17 5.635e+16   105.3 <2e-16 ***
## Residuals   737 3.944e+17 5.351e+14                   
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a p-value below 2e-16, we reject the null hypothesis at any conventional significance level. There is evidence that mean attendance differs between at least two of the three facility types.
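As a quick check on the ANOVA table, the F statistic can be reproduced by hand: each mean square is its sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The figures below are the rounded values printed by `summary(anova_1)`, so the last digit may differ slightly from R's unrounded computation; the subscripts are labels chosen here to match the rows of the table.

\[MS_{\text{Type}} = \frac{1.127 \times 10^{17}}{2} \approx 5.635 \times 10^{16}, \qquad MS_{\text{Residuals}} = \frac{3.944 \times 10^{17}}{737} \approx 5.351 \times 10^{14}\]

\[F = \frac{MS_{\text{Type}}}{MS_{\text{Residuals}}} \approx \frac{5.635 \times 10^{16}}{5.351 \times 10^{14}} \approx 105.3\]

The degrees of freedom are consistent with three facility types (3 - 1 = 2) and 740 observations used in the fit (740 - 3 = 737).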